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^ Abstract 

As was initially shown by Brent, exponentials of truncated power series can be 
^ computed using a constant number of polynomial multiplications. This note gives 

a relatively simple algorithm with a low constant factor. 
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Let K be a ring of characteristic zero and let h be in with /t(0) = 0. 

^ The exponential of h is the power series 



exp(/i) = 

i>0 



Computing exponentials is useful for many purposes, such as solving differ- 
ential equations [4] or recovering a polynomial from the power sums of its 
roots [11]. 

Using Newton iteration, it has been known since Brent's work [3] that expo- 
nentials could be computed for the cost of polynomial multiplication, up to a 
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constant factor. Following this original result, a series of works aimed at low- 
ering the multiplicative factor; they all rely on some form of Newton iteration, 
either of order 2 (the "usual" form of iteration) or of higher order. Remark 
that the question of improving constant factors can be asked with other appli- 
cations of Newton iteration (power series inversion, square root, . . . ) [12,2,6], 
but we do not discuss those here. 

As is customary, we assume that the base ring K supports the Fast Fourier 
Transform (as an aside, note that in the Karatsuba multiplication model, 
exponential computation has an asymptotic cost equivalent to that of multi- 
plication [7, § 4.2.2]). If m G N is any power of 2, we suppose that K contains 
a mth primitive root of unity Um such that in addition Um = i^im! also, 2 is 
a unit in K. We denote by E(m) an upper bound on the cost of evaluating a 
polynomial of degree less than m at the points (1,0;^, . . . ,c<;™~^). Using Fast 
Fourier Transform, we have E(m) e O(mlogm); we also ask that E satisfies 
the super-linearity property E(2m) > 2E(m). 

Theorem 1 Let h e K[[a;]], with h{0) = and let n e N be a power of 2. 
Then, starting from uin and from the first n coefficients of h, one can compute 
the first n coefficients of ex.p{h) using 16|E(n) -|- 24|n operations in K. 

Using Fast Fourier Transform, polynomials of degree less than n can be mul- 
tiplied in 3E(2n) +0{n) operations. Hence, we say that an exponential can be 
computed for (essentially) the cost of 2| multiplications. References to pre- 
vious work given below use the same ratio "cost of exponential vs. cost of 
multiplication" . 

As documented by Bernstein [1], the initial algorithm by Brent had cost 7| 
times that of multiplication. Bernstein successively reduced the constant fac- 
tor to 3 1 and 2| [2] using high-order iterations. Recently, van der Hoeven [10] 
obtained an even better constant of 2|. However, that algorithm (using a 
high-order iteration) is quite complex (to wit, the second-order term in the 
cost estimate is likely not linear in n); we are not aware of an existing imple- 
mentation of it. 

As to ordcr-2 iterations, Bernstein [2] obtained a constant of 3|, which was 
superseded by Hanrot and Zimmermann's 3| result [6]. The merits of our 
algorithm is thus to be a simple yet faster second order iteration. Compared 
to van der Hoeven's result, we are asymptotically slower, but we could expect 
to be better for a significant range of n, due to the simplicity of our algorithm. 

Proof. For a = Y.i>o<^iX^ ^ ^[W]) write a mod = Y.iZo'^iX^ and 
a div = J2i>o di+ex'", computing these quantities does not require any arith- 
metic operation. In Figure 1, we first give the standard iteration (left), taken 
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from Hamot and Zimmermann's note [6], followed by an expanded version 
where polynomial multiplications are isolated (right). Correctness of the left- 
hand version is proved in [6]; in particular, each time we enter the loop at 
Step 2, / = exp(/i) mod x"^ and g = l/f mod x"*/^ hold. 





Exp(/i, n) 




v. 


f = ^,g = l,m = 1 


Exp(/i, n) 


2'. 


while m < n/2 do 


1- f = ^,9 = ^,m = l 


2.a' 


= (2(? — fg"^) mod x"* 


2. while m < n/2 do 


2.b' 


q = h' mod a;™"^^ 


2. a g = {2g — fg"^) mod x'^ 
2.b q^h' mod x'""^ 


2.c' 


r = fq mod (x™ — 1) 


2.d' 


s = — r) mod (x™ — 


2.0 w — q-\- g{f' — fq) mod x'^'^" 


■1 2.6' 


t — gs mod x"* 


2.d f = f + f{h- J w) mod x"^"^ 


2.f 


u — (h mod x^"* — / 


2.C m = 2m 


2.g' 


V = fu mod 


3. return / 


2.h' 


f = f + x'^v 



2.i' m — 2m 
3'. return / 



Fig. 1. Two versions of the exponential computation 

To prove the correctness of our version, it is enough to show that it computes 
the same output as the original one. When entering Step 2 we have / = 
exp(/i) mod x"^; it follows that x{f' — qf) — mod x"^, with q = h' mod x"^~^. 
Since x{f' — qf) has degree less than 2m, we deduce that the quantity s of 
Step 2.d' satisfies x{f' — qf) = x^s. This implies that t = gs mod x"^ satisfies 
fj^m-i _ gi^ji _ mod a;^™"^^, so that the quantities w of Step 2.c and u 
of Step 2.f satisfy u — {{h — J w) mod x^"*) div x^. The original iteration 
satisfies h — J w = mod x'^, so that actually x"^u = {h — J w) mod x^"* and 
thus x"^v — f{h — J w) mod x^"^, with v — fu mod x"^. The correctness claim 
follows. 



For / in K[x] and m a power of 2, we define 

DFT(/,m) = (/(I), . . . ,/«-^)), DFT'(/,m) = (/(a;2j, . . . , /(a;2^<-^)), 

so that DFT(/, 2m) is, up to reordering, the concatenation of DFT(/, m) and 
DFT'(/, m). Recall that if / has degree less than m, then DFT(/, m) can 
be computed in time E(m); besides, DFT'(/, m) can be computed in time 
E(m) + 2m (due to the scaling by U2m)] the inverse DFT in length m can be 
performed in time E(m) + m (due to m divisions by m). 

With this, we finally analyze the cost of the algorithm step by step. We assume 

that the n elements {l,ujn, ■ ■ ■ ,^n~^) have been precomputed in time n once 
and for all, and stored, so that they are freely available during the remaining 
computations. The hypothesis cUm = t^im ensures that all the needed DFT's 
solely use (part of) these n elements. 
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In what follows, we assume m is a power of 2, with m > 2, so that m/2 is 
an integer. Recall that at the input of Step 2, / has degree at most m — 1 
and g has degree at most m/2 — 1; additionally, we suppose that DFT{g,m) 
is known. Then, the key ingredients are as follows: 

(1) We will compute DFT((?,2m); since DFT{g,m) is already known, it is 
enough to compute DFT'(5',m), which saves a factor of 2. 

(2) Since x{f' — qf) = x"^s, we can compute it modulo x"* — 1. 

Step 2.a' This step updates g to 1/f mod x'^. The product fg'^ has degree 
less than 2m; it is computed by FFT multiplication in length 2m. Since 
DFT{g,m) is known, we do not need to compute DFT(5f,2m) but only 
DFT'(^,m). Hence, the cost is E(2m) (DFT of /) + E(m) + m (DFT' of g) + 
Am (pairwise products) + E(2m) + 2m (inverse DFT). 

By the fundamental property of Newton iteration, the first m/2 — 1 coef- 
ficients of g and 2g — fg"^ coincide. Hence, to deduce 2g — fg"^ mod x"*, only 
m/2 sign changes arc needed. 

Step 2.b' Differentiation takes time m; since half of the coefficients were com- 
puted at the previous loop, the cost can be reduced to m/2. 

Step 2.c' We compute r by FFT multiphcation in length m. Since 
DFT(/, 2m), and thus DFT(/, m), is known, the cost is 2E(m) + 2m. 

Step 2.d' Computing f'—r takes time 2m; multiplication by x modulo x^ — l 
is free. 

Step 2.e' The product gs has degree less than 2m; it is computed by FFT 

multiphcation in length 2m, of cost 3E(2m)+4m. This provides DFT(g', 2m), 

which will be used as input in the next iteration. 
Step 2.f Integration and subtraction together take time 2m. 
Step 2.g' The product fu has degree less than 2m; it is computed by 

FFT multiplication in length 2m. Since DFT(/, 2m) is known, the cost 

is 2E(2m) + 4m. 
Step 2.h' This step is free. 

Hence, the cost of one pass through the main loop is at most 3E(m) + 7E(2m)-|- 
22m. At the last iteration, with m = n/2, savings are possible at Step 2.e', 
since we do not need to precompute DFT(g(,2m) for the next iteration. To 
compute t = gs mod x"^, we write 

g^go + x'^/^gi, s = so + x"'^'^si, t = goSo + x'^''^{gQSi + giSo) mod x'^. 

We compute g^so and g^si -\- giSo by FFT's of order m. Since DFT(g'o, tti) is 
known, we just need to compute DFT((yfi, m), DFT(so, itt-) and DFT(si, m), as 
well as 2 inverse DFT's, for a cost of 5E(m) + 2m; the other linear costs (inner 
products and additions) sum up to 4|m. Adding all costs gives the claimed 
complexity result in Theorem 1. 
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The case of arbitrary n. We gave our algorithm for n a power of 2 (the 
algorithm of [6] does not have this restriction, but assumes that Fourier trans- 
forms can be performed at arbitrary lengths n). We describe here possible 
workarounds for the general case. 

For an arbitrary value of n, Newton iteration will compute the approximations 
exp(/i) mod a;™% where the sequence (mj)j>o is defined by r = [log2(n)] and 
rrii = fn/2'""*], as in [5, Ex. 9.6], so that rrii is either 2mj_i or 2mj_i — 1 
and thus m^-i = [mi/2]. Then, the algorithm enters Step 2 knowing / = 
exp(/i) mod x'^' and g — 1/f mod x^^-'^; it exits Step 2 with / = exp(/i) mod 
^mi+i g — xj J mod x^^ . Depending on the Fourier Transform model we 
use, our improvements can be carried over to this case as well. 

In a model which allows Fourier transforms at roots of unity of any order, 
our algorithm extends in a rather straightforward manner. As before, we also 
suppose that DYT{g, rrii) is known at the beginning of Step 2, where now DFT 
can be taken at arbitrary order. Now, the multiplications at Steps 2. a', 2.c' 
and 2.g' are done with transforms of order respectively 2mj, rrii and 2mj, 
but that of Step 2.c' has order rrii+i to enable the next iteration. This gives 
exp(/i) mod x^"**, and thus exp(/i) mod x'^^+^, by truncating off the last coef- 
ficient in the case where mj+i = 2m j — 1. 

In a model where only roots of unity of order 2^^' are allowed, it is possible to 
use van der Hoeven's Truncated Fourier Transform [8]. For / e K.[x] of degree 
less than m, let TFT(/, m) denote the values . . . , where 

r = [log2(m)] , a; is a primitive root of unity of order 2^, and [i]r is the bitwise 
mirror of i in length r. 

A first difficulty is that the relationship between TFT(/, m) and TFT(/, 2m) 
is less transparent than in the case of the classical Fourier transform. Step 2. a' 
requires to compute only the values TFT(/, 2m) — TFT(/, m); while it is ob- 
viously possible to adapt van der Hoeven's algorithm to this case, as in [9, 
§ 5], determining the exact cost requires a specific study. A second issue is 
that using the values TFT(/, m) does not allow immediately to perform mul- 
tiplication modulo x"^ — 1, which is needed to compute s at Step 2.d' of our 
algorithm. However, this problem can be solved by computing s/x'^, which is 
a polynomial of degree less than m (remark that the same issue arises if one 
wants to use the Truncated Fourier Transform in the algorithm of [6]). 



Experiments. Figure 2 gives empirical results, using the FFT routines for 
small Fourier primes implemented in Shoup's NTL library [13]. As can be 
seen, a ratio close to the expected 2.75 is observed. 
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Fig. 2. Ratio exponential vs. product 
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